Humans are the only species on the planet that can produce enough energy to generate electromagnetic radiation from the visible spectrum that is visible from space. We create a tremendous amount of this energy by lighting the spaces we inhabit, work at, and travel through. These night lights are a distinctly human part of our planet; what, if anything, can they actually inform about humanity as a whole? Over the course of this lesson, we will be assessing if imagery captured by the VIIRS day/night band can provide us with a more nuanced understanding of the human population found at any given location. To do this we will be performing a series of spatial analyses to test the correlation between monthly averages of nighttime radiance captured by VIIRS and social economic factors conveyed by census data. We will be performing this evaluation over three counties in Texas, but the workflow will be possible anywhere census data is found, as VIIRS provides daily night light images of the entire Earth.
We are relying on a few different datasets for this tutorial. The first and most important one is monthly composites of nighttime radiance derived from the VIIRS day night band imagery and compiled by the Colorado School of Mines Earth Observation Group.
We’re also using county level data from Natural Earth.
Lastly, we will use census tract data of the American Community Survey 2015, which can be retrieved using the tidycensus R library.
You can find a more comprehensive description of the VIIRS monthly
composites by opening
/data/describeVIIRSData.html
within your ‘intermediateGeospatialR’ folder.
Data Structure
The monthly composites are delivered at the continental scale. Each month has two images associated with it. The first image contains an average surface radiance value for that given location. The second image contains the number of observations across the month that were used to generate the average monthly value presented in the radiance image.
Distribution of Radiance Values
In the example provided in the describeVIIRSData.html,
we can see that the distribution of radiance values is right skewed.
This means that the majority of the observations are at or near zero and
there are a few locations that significantly brighter. The is important
information to understand before starting the analysis as most of the
observations will tell us little more then it is dark at that
location.
A Bright Place
The Lux, or Sky Beam, on top of the Luxor Hotel in Vegas is the brightest object on earth at 42.3 billions candela. This beam is produced by 39,7000 watt lamps. When all lamps are operating, the interior temperature of the room in which they are contained raises to approximately 300 degrees Fahrenheit.
Number of Observations
The VIIRS day/night band cannot capture data from under clouds. Therefore, not every night provides information regarding the surface radiance at a given location. When working with the monthly composites, it is essential to consider how many observations went into the calculation of a mean radiance. This is particularly important with this sensor, as the view angle from which the image was captured affects the observed radiance (think about a building blocking light in a particular direction). We will be evaluating this effect in the second part of the lesson.
Processing Spatial Data
We are going to be working with sf
and terra
to process the imagery and polygons.
Census Data
tidycensus requires a personal access token to utilize
the API. As such, we are not going to be pulling data from tidycensus as
part of this lesson. However, the code showing how the data was pulled
is provided as supporting material (
src/section1/prepWorkshopFilePrep.R) if this is something
you need to do later on in your work.
Visualizing Spatial Data
We will be using the tmap library for quick
visualizations of these spatial features.
Data Wrangling and Summarizing
dplyr, tidyr and stringr are
great packages for efficient data wrangling, and are a part of the tidyverse, a
collection of R packages designed for data science.
source("installPackages.R")
We now have a function in our environment called
packageLoad(). With this function we can supply it with all
the packages we need for the lesson, and for each one it checks if it is
currently installed, installs it if it is not, and then loads all
packages into our session memory.
# load required libraries
packageLoad(c("sf", "terra", "dplyr", "tidyr", "stringr", "tmap"))
Since this lesson is in an R Project, your working directory should
be automatically set to the correct path (the directory your .Rproj file
is in). Double check that it is set to something that looks like:
~/Desktop/R_SC_Spatial/, depending where you saved the
folder from GitHub.
You could still be waiting for installations, which is fine.
If you are having issues with a specific package connect with a helper.
If everything seemed to work but you want to double check, use the base R function below to see what’s currently loaded into your session memory.
(.packages())
Imagery
We will pull our full list of images using the
list.files() function. This function grabs all the files
within a specific directory. We can filter for specific files based on a
text pattern. We will use .tif to grab all the rasters.
It’s essential here that we also indicate full names equals
TRUE. This will provide the full path to the images, which
we need to read them in.
# night light imagery
images <- list.files(path = "intermediateGeospatialR/data/nightLights",
pattern = ".tif",
full.names = TRUE)
Currently our images are just files paths, which is fine because we don’t need spatial objects just yet. There are two types of images for each month, count and radiance values. For this first lesson we just want the ‘counts’ data, which represents the number of observation days within that month.
# use the 'grep' function to filter out just the 'counts' image files
counts <- images[grepl(pattern = "_counts", x = images)]
How would you select the remaining images in the file folder excluding the “_counts” images (i.e., just the radiance files)? We will be using these radiance files for the rest of the lesson.
The are multiple ways to approach this.
# use the opposite operator to grab the other half. '!' filters anything that does NOT equal that pattern
radiance <- images[!grepl(pattern = "_counts", x = images)]
# select all paths that are not current in the counts file
radiance <- images[!images %in% counts]
# or select a pattern unique to the image set you need to grab
radiance <- images[grepl(pattern = "arc.tif", x = images)]
County Data
The county data has been prepped for us so we can read it in by
pointing to the .shp file which it is stored in.
## county locations
counties <- sf::read_sf(dsn = "intermediateGeospatialR/data/counties/countyTex.shp",quiet =TRUE)
# head(counties)
This data contains records for three counties in Texas.
Census Data
The census data for these three counties has be prepped for this
lesson. Reference the file prepWorkshopFilePrep.R to see how to
use the tidycensus library to call census data directly
through R.
The census data was saved as a spatial .shp file (the
census attributes were tied to each census tract within our counties of
interest), so we can read it in like we did the county data.
## census tract data
censusT <- sf::read_sf(dsn = "intermediateGeospatialR/data/census/ageAndPoverty.shp", quiet =TRUE)
# head(censusT)
With everything loaded, let’s plot the data and take a look at each one.
tmap::tmap_mode("view") # sets up the interactive map
# we need to read the raster in to visualize
tmap::qtm(terra::rast(radiance[1]))
Because the images object is just a vector of file paths
we need to call the rast function to read in one of the
images as a raster file to visual it. We do not need to store this as an
object at this point.
The map is not exciting because most of the state is dark and the
tmap library forces a generalization (reclassification) of
the data to visual large extents. The net effect of this is a very
boring map. We can examine the specific values of the raster to be sure
the data is valid.
Nesting Functions to Visualize All Unique Values
# read in raster, grab values, determine unique values, convert to df for visualization purposes, View the dataframe for visual inspection
View(data.frame(unique(values(terra::rast(radiance[1])))))
There is data there, so let’s move on.
Visualize the county data
# there are three counties of interest
tmap::qtm(counties)
Visualize Census Tract data
We are working at the census tract level. Each census track has on average 4,000 people in it. The population of Houston is about 2.3 million, so we can expect a lot of rows in this dataset. Add to that the fact that each census tract is likely to have a fairly complex geometry, which requires storing a lot of coordinates. So, lets understand more about this dataset before we try to plot it.
# By printing the object we can see some metadata and the first 10 rows in the console
censusT
## Simple feature collection with 2406 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -98.80655 ymin: 28.82557 xmax: -94.9085 ymax: 30.1705
## Geodetic CRS: NAD83
## # A tibble: 2,406 x 6
## GEOID NAME variable estimate moe geometry
## <chr> <chr> <chr> <dbl> <dbl> <MULTIPOLYGON [°]>
## 1 48029190601 Census T~ B01002_~ 31.3 3.7 (((-98.52602 29.46795, -98.52~
## 2 48029190601 Census T~ B17001_~ 617 266 (((-98.52602 29.46795, -98.52~
## 3 48201311900 Census T~ B01002_~ 35.7 3.5 (((-95.33319 29.72589, -95.32~
## 4 48201311900 Census T~ B17001_~ 403 166 (((-95.33319 29.72589, -95.32~
## 5 48201450200 Census T~ B01002_~ 39.6 2.4 (((-95.59029 29.78492, -95.57~
## 6 48201450200 Census T~ B17001_~ 63 50 (((-95.59029 29.78492, -95.57~
## 7 48201450400 Census T~ B01002_~ 28.8 8.7 (((-95.62377 29.78472, -95.62~
## 8 48201450400 Census T~ B17001_~ 830 1069 (((-95.62377 29.78472, -95.62~
## 9 48201554502 Census T~ B01002_~ 40.6 2.5 (((-95.65325 29.99056, -95.65~
## 10 48201554502 Census T~ B17001_~ 203 170 (((-95.65325 29.99056, -95.65~
## # ... with 2,396 more rows
#the 'str' base R function gives us more details on the structure of the dataset
str(censusT)
## sf [2,406 x 6] (S3: sf/tbl_df/tbl/data.frame)
## $ GEOID : chr [1:2406] "48029190601" "48029190601" "48201311900" "48201311900" ...
## $ NAME : chr [1:2406] "Census Tract 1906.01, Bexar County, Texas" "Census Tract 1906.01, Bexar County, Texas" "Census Tract 3119, Harris County, Texas" "Census Tract 3119, Harris County, Texas" ...
## $ variable: chr [1:2406] "B01002_001" "B17001_002" "B01002_001" "B17001_002" ...
## $ estimate: num [1:2406] 31.3 617 35.7 403 39.6 63 28.8 830 40.6 203 ...
## $ moe : num [1:2406] 3.7 266 3.5 166 2.4 ...
## $ geometry:sfc_MULTIPOLYGON of length 2406; first list element: List of 1
## ..$ :List of 1
## .. ..$ : num [1:14, 1:2] -98.5 -98.5 -98.5 -98.5 -98.5 ...
## ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA
## ..- attr(*, "names")= chr [1:5] "GEOID" "NAME" "variable" "estimate" ...
You can also inspect the spatial object as a dataframe with
View, which allows you to sort and filter the data
View(censusT)
Let’s plot these census tracts now. Note that you can click on a census tract to view a pop-up of its attributes.
tmap::qtm(censusT)
Probably the most important part of working with spatial data is the coordinate reference system (CRS) that is used. In order to analyze and visualize spatial data, all objects must be in the exact same CRS.
Therefore, before we move on lets check the CRS of our raster and vector files: The data looks good, so let’s check the details.
# All the raster images were created using the same methodology so we can check the CRS of just one of them
terra::rast(images[1])
## class : SpatRaster
## dimensions : 650, 798, 1 (nrow, ncol, nlyr)
## resolution : 0.01666667, 0.01666667 (x, y)
## extent : -106.7271, -93.42708, 25.75208, 36.58542 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : april_10arc.tif
## name : april_10arc
## min value : 0.08749756
## max value : 4220.476
#we will use this rasters' propoerties later so let's save it as a variable
temp <- terra::rast(images[1])
This dataset is in WGS84 (longitude/latitude).
# print to view attributes
censusT
## Simple feature collection with 2406 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -98.80655 ymin: 28.82557 xmax: -94.9085 ymax: 30.1705
## Geodetic CRS: NAD83
## # A tibble: 2,406 x 6
## GEOID NAME variable estimate moe geometry
## <chr> <chr> <chr> <dbl> <dbl> <MULTIPOLYGON [°]>
## 1 48029190601 Census T~ B01002_~ 31.3 3.7 (((-98.52602 29.46795, -98.52~
## 2 48029190601 Census T~ B17001_~ 617 266 (((-98.52602 29.46795, -98.52~
## 3 48201311900 Census T~ B01002_~ 35.7 3.5 (((-95.33319 29.72589, -95.32~
## 4 48201311900 Census T~ B17001_~ 403 166 (((-95.33319 29.72589, -95.32~
## 5 48201450200 Census T~ B01002_~ 39.6 2.4 (((-95.59029 29.78492, -95.57~
## 6 48201450200 Census T~ B17001_~ 63 50 (((-95.59029 29.78492, -95.57~
## 7 48201450400 Census T~ B01002_~ 28.8 8.7 (((-95.62377 29.78472, -95.62~
## 8 48201450400 Census T~ B17001_~ 830 1069 (((-95.62377 29.78472, -95.62~
## 9 48201554502 Census T~ B01002_~ 40.6 2.5 (((-95.65325 29.99056, -95.65~
## 10 48201554502 Census T~ B17001_~ 203 170 (((-95.65325 29.99056, -95.65~
## # ... with 2,396 more rows
This dataset is in NAD83.
# print to view attributes
counties
## Simple feature collection with 3 features and 17 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -98.80655 ymin: 28.76484 xmax: -94.90849 ymax: 30.17061
## Geodetic CRS: NAD83
## # A tibble: 3 x 18
## STATEFP COUNTYFP COUNTYNS GEOID NAME NAMELSAD LSAD CLASSFP MTFCC CSAFP
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 48 029 01383800 48029 Bexar Bexar Coun~ 06 H1 G4020 <NA>
## 2 48 201 01383886 48201 Harris Harris Cou~ 06 H1 G4020 288
## 3 48 039 01383805 48039 Brazoria Brazoria C~ 06 H1 G4020 288
## # ... with 8 more variables: CBSAFP <chr>, METDIVFP <chr>, FUNCSTAT <chr>,
## # ALAND <dbl>, AWATER <dbl>, INTPTLAT <chr>, INTPTLON <chr>,
## # geometry <POLYGON [°]>
This dataset is also in NAD83.
Matching the CRS
The vector and raster datasets are in a different CRS: WGS84 and NAD83. We need to correct that before we do any spatial analysis.
What coordinate reference system you choose to use is up to you, but
for this lesson we are going to take our two sf features
and project them to WGS84. The reason for this is twofold. First, there
are fewer features to reproject going in this direction. Second, when
you’re reprojecting raster data, there is the potential for resampling
to occur.
It’s probably not going to be an issue with WGS84 to NAD83 in the United States, but since the raster data is our primary dataset we want to avoid altering it any more than we need to.
To transform vector data, we can use the st_transform()
function, supplying it with the CRS of the raster files which can be
retrieved using the crs() function
# reproject datasets to match the CRS of our raster object
counties <- sf::st_transform(x = counties, crs = crs(temp))
censusT <- sf::st_transform(x = censusT, crs = crs(temp))
Test the transformation
Let’s double-check that all our vector and raster data are in the
same CRS so we can move forward. We can use the st_crs()
function from the sf package, which can also retrieve CRS
info from terra raster objects
# test for CRS match
st_crs(counties) == st_crs(temp)
## [1] TRUE
st_crs(censusT) == st_crs(temp)
## [1] TRUE
Great, both return ‘TRUE’.
With the reprojection taken care of, let’s drop the ‘temp’ raster object to keep our working directory clean and memory open.
# remove the raster object to keep the working directory clean and free up working memory
rm(temp)
The imagery is still at the state level. We need to process it down to the county level so that we can associate it with our census tract data. Working with the smallest geographic extent you can is always going to save you time and hopefully headaches as it is less computationally intensive.
Vector data with terra
So far we have just been using properties of each raster and vector
file to perform operations, however to conduct spatial analysis there is
one more step we need to take to make these two types of spatial data
compatible. The terra package uses a special form of vector
data called SpatVect objects instead of sf
objects. Luckily, they have made it relatively easy to convert
sf objects to SpatVect objects using the
vect() function. We will do this first before we build our
workflow.
# convert sf objects to spatvect
counties <- vect(counties)
censusT <- vect(censusT)
The truth is, when you pick up a new workflow (like this one), you’re not going to get it right the first time.
So, we recommend breaking down the process to the minimum number of features to get that workflow working first, then you can start thinking about iterating the process.
Start Small
### figure out the process with one feature
# create a raster from one image file path
r1 <- terra::rast(radiance[1])
### crop the image to one of the counties
# select county of interest (each row is a unique county)
c1 <- counties[1, ]
# crop the raster to the county extent
r2 <- terra::crop(x = r1, y = c1)
# mask the image (only keep cells that overlap with the county polygon)
r3 <- terra::mask(x = r2, mask = c1)
# take a look to see if this worked
qtm(r3)
The basic workflow consists of indexing the images and the counties.
We then call the crop() and mask() functions
from the terra library. The result is a raster that has
been reduced to the boundary of the county of interest.
Streamline the Workflow
Now that we know that process works, let’s try to streamline this
process a little bit so that we are not creating so many variables in
our environment. We will utilize dplyr piping structure.
The %>% pipe is a custom operator that takes the output
of one function and places it as an input into another function. This
operator allows you to connect functions from output to input, output to
input, without creating intermediate variables, thus resulting in more
efficient code.
### select county of interest
c1 <- counties[1, ]
## read, crop and mask the first raster image
r1 <- terra::rast(radiance[1])%>%
terra::crop(y = c1)%>%
terra::mask(mask = c1)
qtm(r1)
We’ve accomplished the same result, but this time we’ve only declared
two variables. This adds clarity to our code and is generally a good
practice to follow.
Make it a Function
The workflow we’ve created above requires two inputs and produces a single output. To make this code more transferable, we can build it out as a function. This optimizes the code for reuse and will allow us to efficiently apply the process to multiple features.
clipMask <- function(path, extent){
## path is a full file path to a raster object
## extent is a spatial object (in SpatVect format) which will be used to clip and mask the raster
## returns a raster object that is clipped and masked to the extent object
r1 <- terra::rast(path) %>%
terra::crop(y = extent) %>%
terra::mask(mask = extent)
return(r1)
}
When we write up the function, we’re inherently abstracting some of the components. So we’re no longer indexing from the county object to pull a specific county. Instead, we’re just saying we want an extent object and we want a raster. Counter this abstraction by putting documentation right at the start of your function so people can understand what it is you’re asking for.
Applying the Function to All Images
In the data folder
/intermediateGeospatialR/data/nightLights we have radiance
imagery for 12 months in 2019. We also have three counties of interest
that we’re going to be working in. If we want radiance for each county
for each month, we need to create 36 images.
We could hard code to call our function 36 times.
im1 <- clipMask(path = radiance[1], extent = counties[1,])
im2 <- clipMask(path = radiance[2], extent = counties[1,])
## so on an so one
While this will definitely work, it is very repetitive. If we had hundreds of counties instead of 3 this would take forever to write. This is where iterating with ‘for loops’ comes into play.
Loop the Loop
We can define all 33 features much more efficiently by utilizing for loops.
For loops (think “for each feature in the list”) are a great tool for doing the same operation multiple times.
When developing a for loop operation, it is helpful to write out the general structure first and then assign your variables to test the process before you let the loop run. If the loop works once, it should (ideally) work every time. For this example we expect a single image to return from each iteration of the loop.
### outline the structure you want
# for each county - return an image for each month
## select the county of interest
# for each raster
## select a specific month
## clipMask()
## save the processed image
Take a moment to fill in the code using the above outline.
# for each county - return an image for each month
for(i in seq_along(counties)){
c1 <- counties[i,] # select a specific county
for(j in seq_along(radiance)){
## pull a specific month's image
p1 <- radiance[j]
## clipMask()
r2 <- clipMask(path = p1, extent = c1)
## save the processed image
# we will come back to this
}
}
So it looks okay, but let’s test it on just a single iteration before we run the entire loop on all features.
### define i and j so the loop only runs for the first features
i <- 1
j <- 1
###
#comment out the loop part since we are just running once
#for(i in seq_along(counties$STATEFP)){
c1 <- counties[i,] # select a specific county
# for(j in seq_along(images)){
## pull a specific month's image
# p1 <- images[j]
## clipMask()
r2 <- clipMask(path = radiance[j], extent = c1)
## save the processed image
# we will come back to this
# }
#}
### print the output to check results
r2
## class : SpatRaster
## dimensions : 39, 42, 1 (nrow, ncol, nlyr)
## resolution : 0.01666667, 0.01666667 (x, y)
## extent : -98.81041, -98.11041, 29.11875, 29.76875 (xmin, xmax, ymin, ymax)
## coord. ref. : lon/lat WGS 84 (EPSG:4326)
## source : memory
## name : april_10arc
## min value : 0.425001
## max value : 116.5439
The slimmed down loop worked as we expected. It returned a processed image, but we haven’t figured out what to do with the processed image yet.
Saving the Output From the Loop
If we were to run the looping operation as is, it would process the
36 images, but we would end up with only a single feature at the end.
That is because the variable r2 is being rewritten every
time the loop repeats itself. We could get crafty and store all these
images in memory using a list(), or we can write out the
files and pull them back in as needed. There are pros and cons for both,
but today we will be writing out the imagery files.
See the code below if you wanted to store it all in memory as a list.
# create empty list
processImages <- list()
for(i in seq_along(counties)){
c1 <- counties[i,] # select a specific county
list1 <- list()
for(j in seq_along(radiance)){
## pull a specific month's image
# p1 <- images[j]
## clipMask()
list1[[j]]<- clipMask(path = radiance[j], extent = c1)
}
processImages[[i]] <- list1
rm(list1)
}
processImages
Growing a list like this is pretty inefficient and indexing can be a lot to keep track of. Still, there are cases when working in memory is the best option.
Develop a File Structure
If we want to save the processed images, we need to first create a
directory in which we can store all the images for each county. We can
do this using a similar loop structure as the one above. By
concatenating characters using the paste0() function we can
build out paths for new files.
# create new file directories for processed imagery
for(i in unique(counties$NAME)){ # looping over names not indices
print(i)
location <- paste0("data/nightLights/", as.character(i)) # create path for file
if(!file.exists(location)){ # test if this folder already exists
dir.create(path = location) # if it doesn't exist, create the folder
}
}
With the file structure in place, we can run the process and write out the rasters by constructing a unique path and file name for a given month at a specific location.
We can use names(r1) in this case because these images
were named when they were prepped for this lesson. You can check out how
this was done in the script prepWorkshopFilePrep.R. That
process is still dependent on initial file management actions shown here.
All this is to say it is important to think about your folder structure
when creating material as it can have long lasting effects on your
workflow.
# for each county - return an image for each month
for(i in seq_along(counties)){
print(i)
c1 <- counties[i,] # select a specific county
for(j in seq_along(radiance)){
## pull a specific month's image
p1 <- radiance[j]
## clipMask()
r2 <- clipMask(path = p1, extent = c1)
## save the processed image
countyName <- as.character(c1$NAME) # define county name
rasterName <- names(r2) # define raster name
file <- paste0("intermediateGeospatialR/data/nightLights/",countyName, "/",rasterName,".tif") # concatenate features to create a path
terra::writeRaster(x = r2 ,
filename = file,
overwrite = TRUE)
}
}
Room for Improvement?
So this does work well, but looking back at the clipMask
function we can see that it reads in the same raster multiple times
during the process.
clipMask
## function(path, extent){
## ## path is a full file path to a raster object
## ## extent is a spatial object (in SpatVect format) which will be used to clip and mask the raster
## ## returns a raster object that is clipped and masked to the extent object
## r1 <- terra::rast(path) %>%
## terra::crop(y = extent) %>%
## terra::mask(mask = extent)
## return(r1)
## }
Since we iterate over the counties first, we have to read in each monthly raster three times, once for each county. That’s not that big of a deal, because these rasters are relatively small images, but if they were huge, we would be waiting three times more than we need to.
Let’s adapt the function so that we can rework the workflow and only read in monthly images once.
Reduce the Number of Raster Reads by 3
We wrote the first clip mask function in the script that we are using
to run the code, but you can also write functions as standalone scripts
and source() them into your code. That way, you can keep
your primary work flow clean and still have your functions to rely on.
So let’s source this new function (saved as a script in the intermediate
lesson folder) and take a look at it.
source("intermediateGeospatialR/src/section1/clipMask2.R")
print(clipMask2)
## function (raster, extent)
## {
## if (terra::ext(raster) < terra::ext(extent)) {
## print("The raster may be smaller then the extent object")
## }
## if (terra::crs(raster) != terra::crs(extent)) {
## return("The crs of the objects to not overlap")
## }
## else {
## return(raster %>% terra::crop(y = extent) %>% terra::mask(mask = extent))
## }
## }
The output of this function is the same as our current one, but there are three major differences.
1. Input a raster object, not a path to an image.
function(raster, extent){
# raster : a raster object
# extent : a spatial feature or extent object
}
Rather then reading the raster file in within the function, we are passing a raster object to the function. This is the key step for reducing the number of times we need to read in data within our workflow.
2. Testing the extent of the objects.
We build in a condition that tests if the raster is larger than the feature it is being clipped and masked to. This is a just a concept check, but a great one if you’re planning on using this workflow in multiple different projects.
if(terra::ext(raster) < terra::ext(extent)){
print("The raster may be smaller then the extent object")
}
3. Testing the CRS of the objects.
If the objects are not in the same CRS we cannot perform the spatial analysis.
if(terra::crs(raster) != terra::crs(extent)){
return("The crs of the objects to not overlap")
}else{
return(raster %>%
terra::crop(y = extent) %>%
terra::mask(mask = extent))
}
Restructure the Workflow for the New Function
As the clipMask2() function requires a raster object as
the input, we need to restructure our for loops to account for this
change. The goal here is to limit the number of times we need to read in
the raster objects, so let’s start by looping over all twelve
images.
for(i in seq_along(radiance)){
r1 <- terra::rast(radiance[i]) # read in the raster
nameR <- names(r1) # save the name as a variable to use in the file structure later
}
We still need our extent object to run the function, so let’s next loop over that feature.
for(j in seq_along(counties)){
c1 <- counties[j,] # select a specific county
nameC <- as.character(c1$NAME) # grab the name of the county
r2 <- clipMask2(raster = r1, extent = c1) # call the function with input from the loop above
### we've already written this content out so we don't need to repeat the process
# raster::writeRaster(x = r2,filename = paste0(baseDir, "/data/nightLights/", nameC,"/",nameR,".tif"))
}
Now we can combine the loops.
# loop over images
for(i in seq_along(radiance)){
r1 <- terra::rast(radiance[i])
nameR <- names(r1)
# loop over counties
for(j in seq_along(counties)){
c1 <- counties[j,]
nameC <- as.character(c1$NAME)
r2 <- clipMask2(raster = r1, extent = c1)
### we've already writen this content out so we don't need to repeat the process.
# raster::writeRaster(x = r2,filename = paste0"intermediateGeospatialR/data/nightLights/", nameC,"/",nameR,".tif"))
}
}
This is great. We get to the same result but using a different path that is more efficient because we are reading in less data.
Closing Thoughts
We wanted to walk through this as an iterative process because this is how creating a workflow actually works. You don’t get it right the first time, and you usually don’t get it right the second or third time. Good code is good because you come back to it, you alter it, and you make it better over time.
The rasters are prepped for 12 months for every county of interest. Now we need to work on our census data so that we can associate average nighttime radiance with each one of these census tracks.
head(censusT)
## GEOID NAME variable estimate
## 1 48029190601 Census Tract 1906.01, Bexar County, Texas B01002_001 31.3
## 2 48029190601 Census Tract 1906.01, Bexar County, Texas B17001_002 617.0
## 3 48201311900 Census Tract 3119, Harris County, Texas B01002_001 35.7
## 4 48201311900 Census Tract 3119, Harris County, Texas B17001_002 403.0
## 5 48201450200 Census Tract 4502, Harris County, Texas B01002_001 39.6
## 6 48201450200 Census Tract 4502, Harris County, Texas B17001_002 63.0
## moe
## 1 3.7
## 2 266.0
## 3 3.5
## 4 166.0
## 5 2.4
## 6 50.0
dim(censusT)
## [1] 2406 5
length(unique(censusT$GEOID))
## [1] 1203
Restructuring the Census Data
Each census tract has two variables associated with it. Because of how we pulled the data, these appear as separate rows, which duplicates all of our geometries. One way to fix this would be to filter by our two variables, creating two separate datasets, specify variable names and then join them back together.
# Grab unique variables.
vals <- unique(censusT$variable)
### B01002_001 == median age
### B17001_002 == poverty
Say you wanted to filter out just the median age data, whose variable ID is ‘B01002_001’, how would you do that?
There are a few ways to approach it using base R or
dplyr.
#Base R
c1 <- censusT[censusT$variable == 'B01002_001', ]
# dplyr
c1 <- st_as_sf(censusT) %>% #dplyr only works with 'sf' objects, so we must first convert our 'spatvect' object
dplyr::filter(variable == 'B01002_001')
There is a more efficient way we can clean this data that does not
involve creating intermediate datasets. This would be to transpose the
data, which can be done in the tidyr package with the
pivot_wider() and pivot_longer()
functions.
censusData <- st_as_sf(censusT) %>% # convert back to sf to perform dplyr and tidyr operations
as.data.frame() %>% # treat as data frame to perform tidyverse functions
tidyr::pivot_wider(names_from = variable, values_from = c('estimate', 'moe')) %>%
dplyr::rename(medianAge_estimate = estimate_B01002_001, medianAge_moe = moe_B01002_001,
poverty_estimate = estimate_B17001_002, poverty_moe = moe_B17001_002) %>% #rename columns to something more reader-friendly
st_as_sf() #convert back to sf object
qtm(censusData)
Now we have a cleaned dataset with a single row for each GEOID and mroe descriptive variable names. When we click on a census tract we can see the values for median age and poverty in the pop-up.
Our goal is to end up with a single measure of radiance per census tract per month. This will require some iterative processes, so just like before when we were developing the method for processing the rasters, we want to create a test set first and make sure we get our methodology set before scaling up to an iterative workflow.
## create a subset to test the process.
tempr <- terra::rast("intermediateGeospatialR/data/nightLights/Harris/june_10arc.tif")
head(censusData) # look for harris county locations (to match our temp raster)
## Simple feature collection with 6 features and 6 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -98.52602 ymin: 29.46644 xmax: -95.31069 ymax: 30.00429
## Geodetic CRS: WGS 84
## # A tibble: 6 x 7
## GEOID NAME geometry medianAge_estim~ poverty_estimate
## <chr> <chr> <POLYGON [°]> <dbl> <dbl>
## 1 48029190601 Censu~ ((-98.52602 29.46795, -9~ 31.3 617
## 2 48201311900 Censu~ ((-95.33319 29.72589, -9~ 35.7 403
## 3 48201450200 Censu~ ((-95.59029 29.78492, -9~ 39.6 63
## 4 48201450400 Censu~ ((-95.62377 29.78472, -9~ 28.8 830
## 5 48201554502 Censu~ ((-95.65325 29.99056, -9~ 40.6 203
## 6 48201550603 Censu~ ((-95.48031 29.96432, -9~ 30.3 1404
## # ... with 2 more variables: medianAge_moe <dbl>, poverty_moe <dbl>
# pull a subset
tempc <- censusData[2:4,]
Extract Values to Polygons
The terra::extract() function is a marvelous tool. We
can pass a raster and a spatial object and it will pull the values from
the raster that intersect with the spatial object.
The function returns all extracted values as a numeric vector tied to an ‘ID’ related to the spatial features
# extract values to a vector
terra::extract(x = tempr, y = vect(tempc)) ## remember sf objects must be converted to `spatvect` objects to use in terra functions
Since there are multiple values for each census tract, we can call a
summarizing function within the extract() function. This is
great when you are looking for a single value like the mean.
## average the radiance values within each spatial feature
tempc$june <- terra::extract(x = tempr, y = vect(tempc), fun = mean)
Comparing Census Data to Radiance
Our census data is across three counties. Since our workflow is per county, we will need to subset our census tract data by county name.
View(censusData)
We can do this using filter() from the
dplyr package and str_detect() from the
stringr package, which detects specific character
strings.
censusHarris <- censusData %>%
filter(str_detect(NAME, 'Harris'))
qtm(censusHarris)
We now have only the census tracts associated with the specific county.
Let’s outline what needs to happen.
# for each county
## select census tracts of interest
## pull all imagery associated with that county
# for each image
## extract the radiance values for each census tract
## store that data
The first loop will look very similar to what we have done previously.
# loop over counties
for(i in seq_along(counties)){
# get county name
cName <- as.character(counties$NAME[i])
## subset the census data connected to a specific county
c1 <- censusData %>%
filter(str_detect(NAME, cName))
# construct a file directory to show where to look for specific county images
dir <- paste0("intermediateGeospatialR/data/nightLights/",cName)
## pull all rasters from a county of interest
rasters <- list.files(path = dir, pattern = ".tif", full.names = TRUE)
}
We have our images and filtered census tract data so let’s work on the second loop.
for(j in seq_along(rasters)){
print(j)
r2 <- terra::rast(rasters[j]) # read in image
n1 <- names(r2)
# assigning column based on raster name and calculate mean value per area
c1[, n1] <- terra::extract(x = r2, y = vect(c1), fun = mean)[,2] # extract returns a matrix, so we are indexing the column so we can store the data as a vector within the sf object.
}
Since this workflow is creating 3 dataframes (one for each county), we need to store those at the end of each iteration as ‘c1’ becomes overwritten with the new county data at the beginning of the next iteration. A good way to do this is to save each output as an element in a list, and then you can combine elements of the list into a single dataframe afterwards.
# create an empty list
df <- list()
Now we can add all the sections together.
# loop over counties
for(i in seq_along(counties)){
cName <- as.character(counties$NAME[i])
## subset the census data connected to a specific county
c1 <- censusData %>%
filter(str_detect(NAME, cName))
# construct a file directory to show where to look for specific county images
dir <- paste0("intermediateGeospatialR/data/nightLights/",cName)
## pull all rasters from a county of interest
rasters <- list.files(path = dir, pattern = ".tif", full.names = TRUE)
for(j in seq_along(rasters)){
print(j)
r2 <- terra::rast(rasters[j]) # read in image
n1 <- names(r2)
# assigning column based on raster name and calculate mean value per area
c1[, n1] <- terra::extract(x = r2, y = vect(c1), fun = mean)[,2] # extract returns a matrix, so we are indexing the column so we can store the data as a vector within the sf object.
}
## condition to compile the datasets, outside of the "j" loop
df[[i]] <- c1
}
# combine the data for each county into a single dataframe
df <- bind_rows(df)
# check the output
# View(df)
Disheartened by the weak relationships present in your revolutionary study, maybe we should have first considered the quality of our raw data. If we go back to the beginning of the lesson, we know that cloud cover affects the number of observations within a given month. It’s possible that we should exclude some months from the analysis because of the limited number of observations that went into the mean. Let’s first look at the variance of the average monthly values for each location.
# use rowise() and c_across() from dplyr to calculate standard deviation
df <- df %>%
as.data.frame() %>%
rowwise() %>%
mutate(varianceRadiance = sd(c_across(april_10arc:september_10arc), na.rm = TRUE)) %>%
st_as_sf()
This gives us enough information to suggest it’s worthwhile to evaluate the quality of the monthly images before going back to look at potential correlations. We’ll be looking at this in the next lesson.